AIbase

# Multimodal Grounding

Kosmos 2 Patch14 24 Dup Ms
MIT
Kosmos-2 is a multimodal large language model that integrates visual information with language understanding to perform image-to-text conversion and visual grounding.
Image-to-Text Transformers
ishaangupta293
21
0
Kosmos 2 Patch14 224
MIT
Kosmos-2 is a multimodal large language model that understands images, generates text descriptions of them, and links phrases in the text to the corresponding image regions.
Image-to-Text Transformers
microsoft
171.99k
162
Kosmos 2 Patch14 224
Kosmos-2 is a multimodal large language model that grounds text in real-world visual elements, supporting a range of vision-language tasks.
Image-to-Text Transformers
ydshieh
62
54
© 2025 AIbase